{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \"LOG->使用PlaidML进行GPU模型训练\"\n",
    "> \"我的RX480终于可以跑模型了\"\n",
    "\n",
    "- toc: true\n",
    "- badges: true\n",
    "- comments: true\n",
    "- categories: [keras,deeplearning,log]\n",
    "- image: images/posts/2020-02-25-using-plaidml-keras-to-ai/PlaidML.png"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 关于PlaidML\n",
    "[PlaidML](https://github.com/plaidml/plaidml)是Intel的一个AI开发工具，目前支持Keras, ONNX,nGraph。  \n",
    "现在大火的Tensorflow和PyTorch只支持Nvidia的CUDA进行GPU加速计算。  \n",
    "而PlaidML可使用OpenCL进行加速。虽然AMD有自己的加速运算平台[ROCm](https://rocm.github.io/)，但目前不支持windows系统，而且OpenCL在速度上貌似还比不上CUDA，对A卡Windows用户（就是我）来说但总比没有的好。\n",
    "\n",
    "本文使用的机器主要配置如下：\n",
    "- E3 1230 v2\n",
    "- RX 480\n",
    "- DDR3 1333 4G x2\n",
    "\n",
    "下面从安装到跑模型，来试试PlaidML的效果如何。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装PlaidML\n",
    "用conda创建一个新环境，在其中使用pip安装PlaidML: \n",
    "> conda create -n plaidml  \n",
    "> conda activate plaidml  \n",
    "> pip instal -U plaidml-keras  \n",
    "> plaidml-setup  \n",
    "\n",
    "根据提示设置PlaidML。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用PlaidML训练Fashion-MNIST分类器\n",
    "首先用tensorflow中的keras跑一下，看看要跑多久。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": "Train on 60000 samples\nEpoch 1/10\n60000/60000 [==============================] - 46s 769us/sample - loss: 0.5855 - accuracy: 0.7840\nEpoch 2/10\n60000/60000 [==============================] - 46s 758us/sample - loss: 0.4013 - accuracy: 0.8541\nEpoch 3/10\n60000/60000 [==============================] - 46s 762us/sample - loss: 0.3586 - accuracy: 0.8695\nEpoch 4/10\n60000/60000 [==============================] - 47s 777us/sample - loss: 0.3357 - accuracy: 0.8769\nEpoch 5/10\n60000/60000 [==============================] - 47s 778us/sample - loss: 0.3172 - accuracy: 0.8826\nEpoch 6/10\n60000/60000 [==============================] - 44s 736us/sample - loss: 0.3010 - accuracy: 0.8888\nEpoch 7/10\n60000/60000 [==============================] - 44s 735us/sample - loss: 0.2867 - accuracy: 0.8940\nEpoch 8/10\n60000/60000 [==============================] - 44s 736us/sample - loss: 0.2781 - accuracy: 0.8974\nEpoch 9/10\n60000/60000 [==============================] - 44s 737us/sample - loss: 0.2697 - accuracy: 0.9013\nEpoch 10/10\n60000/60000 [==============================] - 44s 735us/sample - loss: 0.2631 - accuracy: 0.9033\n10000/10000 [==============================] - 2s 201us/sample - loss: 0.2369 - accuracy: 0.9120\ntraining time cost: 451.5 s, accuracy: 0.91\n"
    }
   ],
   "source": [
    "#collapse\n",
    "# 使用tensorflow.keras(cpu)训练\n",
    "import tensorflow as tf\n",
    "from time import time\n",
    "\n",
    "data = tf.keras.datasets.fashion_mnist\n",
    "(x_train, y_train), (x_test, y_test) = data.load_data()\n",
    "\n",
    "x_train = x_train.astype('float32').reshape(-1, 28, 28, 1) / 255.\n",
    "x_test = x_test.astype('float32').reshape(-1, 28, 28, 1) / 255.\n",
    "# print(x_train.shape)\n",
    "\n",
    "model = tf.keras.Sequential([\n",
    "    tf.keras.layers.Conv2D(\n",
    "        filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28, 28, 1)\n",
    "    ),\n",
    "    tf.keras.layers.MaxPool2D(pool_size=2),\n",
    "    tf.keras.layers.Dropout(0.3),\n",
    "    tf.keras.layers.Conv2D(\n",
    "        filters=32, kernel_size=2, padding='same', activation='relu'\n",
    "    ),\n",
    "    tf.keras.layers.MaxPool2D(pool_size=2),\n",
    "    tf.keras.layers.Dropout(0.3),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(units=256, activation='relu'),\n",
    "    tf.keras.layers.Dropout(0.5),\n",
    "    tf.keras.layers.Dense(units=10, activation='softmax')])\n",
    "model.compile(\n",
    "    optimizer='adam', \n",
    "    loss=tf.keras.losses.sparse_categorical_crossentropy,\n",
    "    metrics=['accuracy'])\n",
    "\n",
    "train_start = time()\n",
    "model.fit(x_train, y_train, batch_size=64, epochs=10)\n",
    "train_end = time()\n",
    "_, accuracy = model.evaluate(x_test, y_test)\n",
    "print('training time cost: {0:.1f} s, accuracy: {1:.4f}'.format(train_end-train_start, accuracy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在轮到本文的主角plaidml，展现真正的技术了~~（不是~~\n",
    "> Note: plaidml和tensorflow都有keras，不同的是使用的后端。如果没有使用`conda`进行环境分隔的话，要将keras的后端切换到plaidml，才能确保plaidml正确运行。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#collapse_show\n",
    "# 更改keras后端\n",
    "import plaidml.keras\n",
    "plaidml.keras.install_backend()\n",
    "import os\n",
    "os.environ['KERAS_BACKEND'] = 'plaidml.keras.backend'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Tip: 如果不想每次都运行上述代码，可在`%USERPROFILE%\\.keras\\keras.json`配置文件中的`\"backend\"`设为 `\"plaidml.keras.backend\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": "Using plaidml.keras.backend backend.\nINFO:plaidml:Opening device \"opencl_amd_ellesmere.0\"\nEpoch 1/10\n60000/60000 [==============================] - 22s 374us/step - loss: 1.1562 - acc: 0.7010\nEpoch 2/10\n60000/60000 [==============================] - 12s 193us/step - loss: 0.5358 - acc: 0.8014\nEpoch 3/10\n60000/60000 [==============================] - 12s 195us/step - loss: 0.4548 - acc: 0.8316\nEpoch 4/10\n60000/60000 [==============================] - 12s 197us/step - loss: 0.4207 - acc: 0.8457\nEpoch 5/10\n60000/60000 [==============================] - 11s 185us/step - loss: 0.3989 - acc: 0.8551\nEpoch 6/10\n60000/60000 [==============================] - 11s 186us/step - loss: 0.3773 - acc: 0.8625\nEpoch 7/10\n60000/60000 [==============================] - 11s 185us/step - loss: 0.3664 - acc: 0.8651\nEpoch 8/10\n60000/60000 [==============================] - 11s 187us/step - loss: 0.3551 - acc: 0.8713\nEpoch 9/10\n60000/60000 [==============================] - 11s 184us/step - loss: 0.3425 - acc: 0.8758\nEpoch 10/10\n60000/60000 [==============================] - 11s 183us/step - loss: 0.3355 - acc: 0.8763\n10000/10000 [==============================] - 5s 473us/step\ntraining time cost: 124.2 s, accuracy: 0.8858\n"
    }
   ],
   "source": [
    "#collapse\n",
    "# 使用plaidml.keras(gpu) 训练\n",
    "import keras\n",
    "from time import time\n",
    "\n",
    "(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
    "x_train = x_train.astype('float32').reshape(-1, 28, 28, 1)\n",
    "x_test = x_test.astype('float32').reshape(-1, 28, 28, 1)\n",
    "\n",
    "model = keras.Sequential([\n",
    "    keras.layers.Conv2D(\n",
    "        filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28, 28, 1)\n",
    "    ),\n",
    "    keras.layers.MaxPool2D(pool_size=2),\n",
    "    keras.layers.Dropout(0.3),\n",
    "    keras.layers.Conv2D(\n",
    "        filters=32, kernel_size=2, padding='same', activation='relu'\n",
    "    ),\n",
    "    keras.layers.MaxPool2D(pool_size=2),\n",
    "    keras.layers.Dropout(0.3),\n",
    "    keras.layers.Flatten(),\n",
    "    keras.layers.Dense(units=256, activation='relu'),\n",
    "    keras.layers.Dropout(0.5),\n",
    "    keras.layers.Dense(units=10, activation='softmax')])\n",
    "model.compile(\n",
    "    optimizer='adam', \n",
    "    loss=keras.losses.sparse_categorical_crossentropy,\n",
    "    metrics=['accuracy'])\n",
    "train_start = time()\n",
    "model.fit(x_train, y_train, batch_size=64, epochs=10)\n",
    "train_end = time()\n",
    "_, accuracy = model.evaluate(x_test, y_test)\n",
    "print('training time cost: {0:.1f} s, accuracy: {1}'.format(train_end-train_start, accuracy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过上面两次训练对比看到，CPU训练需要451秒，而通过PlaidML使用GPU训练则只需124秒，大概缩短了2/3的时间，效果还是很明显的。    \n",
    "综上，PlaidML适合没有N卡但坚守Windows，以及MacBook Pro的用户。但有条件还是搞一台N卡主机吧，"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 番外：为什么不问问神奇的Colab呢？\n",
    "不试不知道，试了才知道。上面代码在使用了GPU的Colab下跑，结果输出：\n",
    "\n",
    "> training time cost: 50.0 s, accuracy: 0.9147  \n",
    "  \n",
    "\n",
    "最后知道真相的我眼泪流下来，手上PlaidML突然就不香了。\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "version": "3.7.4-final"
  },
  "orig_nbformat": 2,
  "file_extension": ".py",
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "version": 3,
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}